Results 1 - 20 of 9,467
1.
J Acoust Soc Am ; 155(4): 2603-2611, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38629881

ABSTRACT

Open science practices have led to an increase in available speech datasets for researchers interested in acoustic analysis. Accurate evaluation of these databases frequently requires manual or semi-automated analysis. The time-intensive nature of these analyses makes them ideally suited for research assistants (RAs) in laboratories focused on speech and voice production. However, completing high-quality, consistent, and reliable analyses requires clear rules and guidelines for all research assistants to follow. This tutorial provides information on training and mentoring research assistants to complete these analyses, covering RA training, ongoing monitoring of data analysis, and the documentation needed for reliable and reproducible findings.


Subject(s)
Voice Disorders , Voice , Humans , Acoustics , Speech
2.
Sci Rep ; 14(1): 8977, 2024 04 18.
Article in English | MEDLINE | ID: mdl-38637516

ABSTRACT

Why do we prefer some singers to others? We investigated how much singing voice preferences can be traced back to objective features of the stimuli. To do so, we asked participants to rate short excerpts of singing performances in terms of how much they liked them as well as in terms of 10 perceptual attributes (e.g.: pitch accuracy, tempo, breathiness). We modeled liking ratings based on these perceptual ratings, as well as based on acoustic features and low-level features derived from Music Information Retrieval (MIR). Mean liking ratings for each stimulus were highly correlated between Experiments 1 (online, US-based participants) and 2 (in the lab, German participants), suggesting a role for attributes of the stimuli in grounding average preferences. We show that acoustic and MIR features barely explain any variance in liking ratings; in contrast, perceptual features of the voices achieved around 43% of prediction. Inter-rater agreement in liking and perceptual ratings was low, indicating substantial (and unsurprising) individual differences in participants' preferences and perception of the stimuli. Our results indicate that singing voice preferences are not grounded in acoustic attributes of the voices per se, but in how these features are perceptually interpreted by listeners.
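The cross-experiment comparison above rests on correlating mean liking ratings per stimulus between the two listener groups. A minimal pure-Python sketch of that Pearson correlation, using made-up toy ratings rather than the study's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation -- the statistic used to compare mean liking
    ratings per stimulus across two groups of listeners."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy mean liking ratings per stimulus from two hypothetical listener groups
exp1 = [3.2, 4.1, 2.5, 4.8, 3.9]
exp2 = [3.0, 4.3, 2.7, 4.6, 4.0]
print(round(pearson_r(exp1, exp2), 3))
```

A high r on stimulus means, alongside low inter-rater agreement, is exactly the pattern that points to stimulus-level attributes driving average (but not individual) preferences.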


Subject(s)
Music , Singing , Voice , Humans , Voice Quality , Acoustics
3.
PLoS One ; 19(4): e0301336, 2024.
Article in English | MEDLINE | ID: mdl-38625932

ABSTRACT

Recognizing the real emotion of humans is considered an essential task for customer-feedback and medical applications. Many methods are available to recognize the type of emotion from a speech signal by extracting frequency, pitch, and other dominant features, which are then used to train models to auto-detect human emotions. However, we cannot rely completely on the features of speech signals to detect emotion: for instance, an angry customer may still speak in a low voice (low-frequency components), which eventually leads to wrong predictions. Even a video-based emotion detection system can be fooled by false facial expressions. To rectify this issue, a parallel model can be trained on textual data and make predictions based on the words present in the text. Such a model classifies emotions using more comprehensive information, making it more robust. To address this issue, we tested four text-based classification models to classify the emotions of a customer. Comparing their results showed that a modified encoder-decoder model with an attention mechanism, trained on textual data, achieved an accuracy of 93.5%. This research highlights the pressing need for more robust emotion recognition systems and underscores the potential of transfer models with attention mechanisms to significantly improve feedback management processes and medical applications.
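The attention mechanism mentioned above lets a classifier weight emotion-bearing words more heavily than neutral ones. A toy pure-Python sketch of attention pooling over token embeddings (the embeddings and query here are illustrative, not the paper's model):

```python
import math

def attention_pool(token_vecs, query):
    """Weight token vectors by softmax(query . token) and sum them --
    the pooling step an attention-based text classifier relies on."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in token_vecs]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]   # numerically stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(token_vecs[0])
    return [sum(w * vec[i] for w, vec in zip(weights, token_vecs)) for i in range(dim)]

# Toy 2-D embeddings for three tokens; the query attends to the second token
tokens = [[1.0, 0.0], [0.0, 5.0], [1.0, 1.0]]
query = [0.0, 1.0]
print([round(x, 3) for x in attention_pool(tokens, query)])
```

With a zero query all tokens receive equal weight and the pooled vector is the plain mean, which is why attention strictly generalizes average pooling.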


Subject(s)
Emotions , Voice , Male , Humans , Speech , Linguistics , Recognition, Psychology
4.
Sensors (Basel) ; 24(7)2024 Mar 22.
Article in English | MEDLINE | ID: mdl-38610256

ABSTRACT

The ongoing biodiversity crisis, driven by factors such as land-use change and global warming, emphasizes the need for effective ecological monitoring methods. Acoustic monitoring of biodiversity has emerged as an important monitoring tool. Detecting human voices in soundscape monitoring projects is useful both for analyzing human disturbance and for privacy filtering. Despite significant strides in deep learning in recent years, the deployment of large neural networks on compact devices poses challenges due to memory and latency constraints. Our approach focuses on leveraging knowledge distillation techniques to design efficient, lightweight student models for speech detection in bioacoustics. In particular, we employed the MobileNetV3-Small-Pi model to create compact yet effective student architectures to compare against the larger EcoVAD teacher model, a well-regarded voice detection architecture in eco-acoustic monitoring. The comparative analysis included examining various configurations of the MobileNetV3-Small-Pi-derived student models to identify optimal performance. Additionally, a thorough evaluation of different distillation techniques was conducted to ascertain the most effective method for model selection. Our findings revealed that the distilled models exhibited comparable performance to the EcoVAD teacher model, indicating a promising approach to overcoming computational barriers for real-time ecological monitoring.
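The knowledge distillation described above trains the small student to match the teacher's temperature-softened output distribution. A minimal sketch of the classic Hinton-style soft-label loss (an assumption about the general technique, not EcoVAD's actual training code):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / temperature) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the softened teacher and student
    distributions, scaled by T^2 as in the original formulation."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return temperature ** 2 * kl

# Matching logits give zero loss; disagreement gives a positive penalty
print(distillation_loss([1.0, 2.0], [1.0, 2.0]))
print(distillation_loss([0.0, 2.0], [2.0, 0.0]) > 0.0)
```

Raising the temperature exposes the teacher's "dark knowledge" in the near-zero classes, which is what makes the distilled student outperform one trained on hard labels alone.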


Subject(s)
Speech , Voice , Humans , Acoustics , Biodiversity , Knowledge
5.
Sci Rep ; 14(1): 9297, 2024 04 23.
Article in English | MEDLINE | ID: mdl-38654036

ABSTRACT

Voice change is often the first sign of laryngeal cancer, leading to diagnosis through hospital laryngoscopy. Screening for laryngeal cancer based on voice alone could enhance early detection. However, identifying voice indicators specific to laryngeal cancer is challenging, especially when differentiating it from other laryngeal ailments. This study presents an artificial intelligence model designed to distinguish between healthy voices, laryngeal cancer voices, and those of other laryngeal conditions. We gathered voice samples from individuals with laryngeal cancer, vocal cord paralysis, benign mucosal diseases, and healthy participants. Comprehensive testing was conducted to determine the best mel-frequency cepstral coefficient conversion and machine learning techniques, with results analyzed in depth. In our tests, distinguishing laryngeal diseases from healthy voices achieved an accuracy of 0.85-0.97; in multiclass classification, however, accuracy ranged from 0.75 to 0.83. These findings highlight the challenges of artificial intelligence-driven voice-based diagnosis due to overlaps with benign conditions but also underscore its potential.
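The mel-frequency cepstral coefficients mentioned above start from the mel scale, which spaces filterbank centres to mimic pitch perception. A sketch of the standard Hz-to-mel conversion (the study's exact MFCC configuration is not given, so the 4-filter bank below is purely illustrative):

```python
import math

def hz_to_mel(f_hz):
    """Standard (O'Shaughnessy) mel-scale mapping used when building
    the mel filterbank that precedes cepstral analysis."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping, back from mels to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Centre frequencies of a toy 4-filter bank spanning 0-4 kHz:
# equally spaced in mels, hence increasingly wide in Hz
lo, hi = hz_to_mel(0.0), hz_to_mel(4000.0)
mel_edges = [lo + i * (hi - lo) / 5 for i in range(6)]
centres_hz = [round(mel_to_hz(m), 1) for m in mel_edges[1:-1]]
print(centres_hz)
```

The log spacing is why MFCCs compress the high frequencies where pathological voice cues are broadband rather than tonal.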


Subject(s)
Artificial Intelligence , Laryngeal Neoplasms , Vocal Cord Paralysis , Humans , Laryngeal Neoplasms/diagnosis , Vocal Cord Paralysis/diagnosis , Vocal Cord Paralysis/physiopathology , Male , Middle Aged , Female , Aged , Adult , Voice/physiology , Laryngeal Diseases/diagnosis , Laryngeal Diseases/classification , Machine Learning , Laryngoscopy/methods
6.
Sensors (Basel) ; 24(5)2024 Feb 25.
Article in English | MEDLINE | ID: mdl-38475029

ABSTRACT

In recent years, there has been a notable rise in the number of patients afflicted with laryngeal diseases, including cancer, trauma, and other ailments leading to voice loss. Currently, the market is witnessing a pressing demand for medical and healthcare products designed to assist individuals with voice defects, prompting the invention of the artificial throat (AT). This user-friendly device eliminates the need for complex procedures like phonation reconstruction surgery. Therefore, in this review, we will initially give a careful introduction to the intelligent AT, which can act not only as a sound sensor but also as a thin-film sound emitter. Then, the sensing principle to detect sound will be discussed carefully, including capacitive, piezoelectric, electromagnetic, and piezoresistive components employed in the realm of sound sensing. Following this, the development of thermoacoustic theory and different materials made of sound emitters will also be analyzed. After that, various algorithms utilized by the intelligent AT for speech pattern recognition will be reviewed, including some classical algorithms and neural network algorithms. Finally, the outlook, challenge, and conclusion of the intelligent AT will be stated. The intelligent AT presents clear advantages for patients with voice impairments, demonstrating significant social values.


Subject(s)
Pharynx , Voice , Humans , Sound , Algorithms , Neural Networks, Computer
7.
JAMA ; 331(15): 1259-1261, 2024 04 16.
Article in English | MEDLINE | ID: mdl-38517420

ABSTRACT

In this Medical News article, Edward Chang, MD, chair of the department of neurological surgery at the University of California, San Francisco Weill Institute for Neurosciences joins JAMA Editor in Chief Kirsten Bibbins-Domingo, PhD, MD, MAS, to discuss the potential for AI to revolutionize communication for those unable to speak due to aphasia.


Subject(s)
Aphasia , Artificial Intelligence , Speech , Voice , Humans , Speech/physiology , Voice/physiology , Voice Quality , Aphasia/etiology , Aphasia/therapy , Equipment and Supplies
8.
JASA Express Lett ; 4(3)2024 03 01.
Article in English | MEDLINE | ID: mdl-38426889

ABSTRACT

The discovery that listeners more accurately identify words repeated in the same voice than in a different voice has had an enormous influence on models of representation and speech perception. Though widely replicated in English, little is known about whether and how this effect generalizes across languages. In a continuous recognition memory study with Hindi speakers and listeners (N = 178), we replicated the talker-specificity effect for accuracy-based measures (hit rate and D') and found the latency advantage to be marginal (p = 0.06). These data help us better understand talker-specificity effects cross-linguistically and highlight the importance of expanding work to less-studied languages.
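The D' (d-prime) measure reported above comes from signal detection theory: the separation between the hit-rate and false-alarm-rate z-scores. A minimal stdlib sketch with toy rates, not the study's data:

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Example: 80% hits to old items, 30% false alarms to new items
print(round(d_prime(0.80, 0.30), 3))
```

Because d' corrects for response bias, a talker-specificity effect on d' reflects genuine memory sensitivity rather than a shift in willingness to say "old".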


Subject(s)
Speech Perception , Voice , Humans , Language , Recognition, Psychology
9.
J Dr Nurs Pract ; 17(1): 3-10, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38538113

ABSTRACT

Background: Many health professionals report feeling uncomfortable talking with patients who hear voices. Patients who hear voices report feeling a lack of support and empathy from emergency nurses. A local emergency department reported a need for training for nurses in the care of behavioral health patients. Objective: The aim of this study is to implement a quality improvement project using a hearing voices simulation. Empathy was measured using the Toronto Empathy Questionnaire, and a post-intervention survey was used to evaluate emergency nurses' perception of the professional development session. Methods: The quality improvement project included the implementation of a hearing voices simulation with emergency nurses. A paired t-test was used to determine the differences in the nurses' empathy levels pre- and post-simulation. Qualitative data were collected on the nurses' experience during the simulation debriefing. A Likert-style questionnaire was used to collect data on the nurses' evaluation of the simulation. Results: The hearing voices simulation produced a statistically significant increase (p < .00) in empathy from baseline (M = 47.95, SD = 6.55) to post-intervention empathy scores (M = 48.93, SD = 6.89). The results of the post-simulation survey indicated that nurses felt that the hearing voices simulation was useful (n = 100; 98%) and helped them to feel more empathetic toward patients who hear voices (n = 98; 96%). Conclusions: Using a hearing voices simulation may help emergency nurses feel more empathetic toward behavioral health patients who hear voices. Implications for Nursing: Through the implementation of a hearing voices simulation, clinical staff educators can provide support to staff nurses in the care of behavioral health patients.
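The paired t-test used above compares each nurse's pre- and post-simulation score against the variability of the within-person differences. A pure-Python sketch of the statistic with illustrative toy scores (not the study's data):

```python
import math

def paired_t(pre, post):
    """Paired-samples t statistic: mean within-pair difference
    divided by the standard error of the differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n), n - 1  # (t statistic, degrees of freedom)

# Illustrative pre/post empathy scores for five nurses
pre = [45, 50, 47, 44, 49]
post = [46, 51, 49, 45, 49]
t_stat, df = paired_t(pre, post)
print(round(t_stat, 3), df)
```

Pairing is what lets a small mean change (here, under one scale point in the study) reach significance: between-person variability cancels out of the differences.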


Subject(s)
Empathy , Voice , Humans , Hallucinations , Emotions , Hearing
10.
Nat Commun ; 15(1): 1873, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38472193

ABSTRACT

Voice disorders resulting from various pathological vocal fold conditions or postoperative recovery of laryngeal cancer surgeries, are common causes of dysphonia. Here, we present a self-powered wearable sensing-actuation system based on soft magnetoelasticity that enables assisted speaking without relying on the vocal folds. It holds a lightweighted mass of approximately 7.2 g, skin-alike modulus of 7.83 × 105 Pa, stability against skin perspiration, and a maximum stretchability of 164%. The wearable sensing component can effectively capture extrinsic laryngeal muscle movement and convert them into high-fidelity and analyzable electrical signals, which can be translated into speech signals with the assistance of machine learning algorithms with an accuracy of 94.68%. Then, with the wearable actuation component, the speech could be expressed as voice signals while circumventing vocal fold vibration. We expect this approach could facilitate the restoration of normal voice function and significantly enhance the quality of life for patients with dysfunctional vocal folds.


Subject(s)
Voice Disorders , Voice , Wearable Electronic Devices , Humans , Vocal Cords/physiology , Quality of Life , Voice/physiology
11.
Rev. logop. foniatr. audiol. (Ed. impr.) ; 44(1): [100330], Jan-Mar, 2024. illus, tab
Article in English | IBECS | ID: ibc-231906

ABSTRACT

Introduction: To use a test in a language or culture other than the original, it is necessary to carry out a psychometric validation in addition to its adaptation. This systematic review assesses the validation studies of voice self-report scales in Spanish. Methods: A systematic review was performed across ten databases. The assessment followed the criteria proposed by Terwee et al. (2007) together with some proposed specifically for this study. Validation studies in Spanish of self-report voice scales published in indexed journals were included; the search was last updated on February 2nd, 2023. Results: 15 studies that evaluated 12 scales were reviewed. Not all the validations conformed to the criteria used, and the properties reported to verify the metric strength of the validations were, in general, few. Conclusions: This systematic review shows that the included studies report little evidence of metric quality. Different strategies have now been developed to obtain more and better evidence of reliability and validity. Our contribution is to reflect on the usual practice of validating self-report scales in the Spanish language. The most important weakness is that broader and more current evaluation protocols could have been used. We also propose to continue this work by completing it with a meta-analytic study.



Subject(s)
Humans , Male , Female , Voice , Psychometrics , Speech, Language and Hearing Sciences , Self Report
12.
Acta Otorhinolaryngol Ital ; 44(1): 27-35, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38420719

ABSTRACT

Objective: The aim of this study was to compare the efficacy of voice therapy combined with standard anti-reflux therapy in reducing symptoms and signs of laryngopharyngeal reflux (LPR). Methods: A randomised clinical trial was conducted. Fifty-two patients with LPR diagnosed by 24 h multichannel intraluminal impedance-pH monitoring were randomly allocated in two groups: medical treatment (MT) and medical plus voice therapy (VT). Clinical symptoms and laryngeal signs were assessed at baseline and after 3 months of treatment with the Reflux Symptom Index (RSI), Reflux Finding Score (RFS), Voice Handicap Index (VHI) and GRBAS scales. Results: Groups had similar scores at baseline. At 3-month follow-up, a significant decrease in RSI and RFS total scores were found in both groups although it appeared to be more robust in the VT group. G and R scores of the GRBAS scale significantly improved after treatment in both groups, with better results in the VT group. The VHI total score at 3 months improved more in the VT group (VHI delta 9.54) than in the MT group (VHI delta 5.38) (p < 0.001). Conclusions: The addition of voice therapy to medications and diet appears to be more effective in improving treatment outcomes in subjects with LPR. Voice therapy warrants consideration in addition to medication and diet when treating patients with LPR.


Subject(s)
Laryngopharyngeal Reflux , Voice , Humans , Laryngopharyngeal Reflux/diagnosis , Laryngopharyngeal Reflux/drug therapy , Pilot Projects , Proton Pump Inhibitors/therapeutic use , Voice Quality
13.
Nurs Open ; 11(2): e2101, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38391105

ABSTRACT

AIM: Discussing nurses' voice behaviour could support managers in making the right decisions and solving problems. DESIGN: This was a discursive paper. METHODS: The discussion was based on a review of the literature. RESULTS: Nurses play a critical role in offering useful constructive advice, which helps management identify and solve problems promptly and thereby improve the working environment. We therefore assert that trust in leadership and the leader-leader exchange system also play a critical role in reinforcing voice behaviour. Trust is a crucial aspect of voice behaviour, and integrating trust in leadership and leader-leader exchange is proposed as a practical means of fostering voice behaviour. Nurse managers must maintain a sense of reciprocal moral obligation in order to nurture value-driven voice behaviour. Open dialogue, active listening, and trust in leadership are important. Nurse managers must consider ways to foster mutual trust, and must support and enable nurses to use voice behaviour in everyday practice.


Subject(s)
Interprofessional Relations , Leadership , Nurses , Trust , Voice , Humans , Nurse Administrators
14.
Eur Arch Otorhinolaryngol ; 281(5): 2707-2716, 2024 May.
Article in English | MEDLINE | ID: mdl-38319369

ABSTRACT

PURPOSE: This cross-sectional study aimed to investigate the potential of voice analysis as a prescreening tool for type II diabetes mellitus (T2DM) by examining the differences in voice recordings between non-diabetic and T2DM participants. METHODS: 60 participants diagnosed as non-diabetic (n = 30) or T2DM (n = 30) were recruited on the basis of specific inclusion and exclusion criteria in Iran between February 2020 and September 2023. Participants were matched according to their year of birth and then placed into six age categories. Using the WhatsApp application, participants recorded the translated versions of speech elicitation tasks. Seven acoustic features [fundamental frequency, jitter, shimmer, harmonic-to-noise ratio (HNR), cepstral peak prominence (CPP), voice onset time (VOT), and formants (F1-F2)] were extracted from each recording and analyzed using Praat software. Data were analyzed with Kolmogorov-Smirnov, two-way ANOVA, post hoc Tukey, binary logistic regression, and Student's t tests. RESULTS: The comparison between groups showed significant differences in fundamental frequency, jitter, shimmer, CPP, and HNR (p < 0.05), while there were no significant differences in formants and VOT (p > 0.05). Binary logistic regression showed that shimmer was the most significant predictor of the disease group. There was also a significant difference between diabetes status and age in the case of CPP. CONCLUSIONS: Participants with type II diabetes exhibited significant vocal variations compared to non-diabetic controls.
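The jitter and shimmer features above quantify cycle-to-cycle instability of the voice: jitter on glottal periods, shimmer on cycle peak amplitudes. A sketch of the common "local" variants (expressed as percentages, matching Praat's definitions; the toy measurements below are illustrative, not the study's data):

```python
def jitter_local(periods):
    """Local jitter (%): mean absolute difference between consecutive
    glottal periods, relative to the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer_local(amps):
    """Local shimmer (%): the same formula applied to cycle peak amplitudes."""
    diffs = [abs(b - a) for a, b in zip(amps, amps[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(amps) / len(amps))

# Toy cycle-to-cycle measurements (periods in seconds, amplitudes arbitrary)
periods = [0.0050, 0.0051, 0.0049, 0.0050]
amps = [0.80, 0.78, 0.82, 0.79]
print(round(jitter_local(periods), 2), round(shimmer_local(amps), 2))
```

A perfectly periodic voice yields zero for both; elevated values indicate the kind of phonatory instability the study found discriminative for the T2DM group.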


Subject(s)
Diabetes Mellitus, Type 2 , Voice , Humans , Voice Quality , Speech Acoustics , Diabetes Mellitus, Type 2/complications , Cross-Sectional Studies , Speech Production Measurement , Acoustics
15.
J Speech Lang Hear Res ; 67(3): 802-810, 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38416067

ABSTRACT

PURPOSE: This study was a modest beginning to determine dominance and entrainment between three soft tissues in the larynx that can be set into flow-induced oscillation and act as sound sources. The hypothesis was that they interact as coupled oscillators with observable bifurcations as energy is exchanged between them. METHODOLOGY: The true vocal folds, the ventricular (false) folds, and the aryepiglottic folds were part of a soft-walled airway that produced airflow for sound production. The methodology was computational, based on a simplified Navier-Stokes solution of convective and compressible airflow in a variable-geometry airway. RESULTS: Three serially connected sources could all produce flow-induced self-oscillation with soft wall tissue and small cross-sectional area. When the glottal cross-sectional areas were similar, bifurcations such as subharmonics, delayed voice onset, and aphonia occurred in the coupled oscillations. CONCLUSIONS: Closely spaced sound sources in the larynx are highly interactive. They appear to entrain to the source that has the combined advantage of small cross-sectional glottal area and proximity to a downstream vocal tract that supports oscillation with acoustic inertance.
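The coupled-oscillator framing above (entrainment with bifurcations as coupling changes) can be illustrated with a much simpler toy than the paper's Navier-Stokes model: two Kuramoto phase oscillators, which frequency-lock when coupling exceeds half their detuning. This is a hypothetical analogue for intuition only, not the study's simulation:

```python
import math

def kuramoto_pair(w1, w2, k, steps=20000, dt=1e-3):
    """Euler-integrate two coupled phase oscillators and return the final
    phase difference. For |w2 - w1| < 2k the pair entrains: the phase
    difference settles at asin((w2 - w1) / (2k)) instead of drifting."""
    th1, th2 = 0.0, 0.5
    for _ in range(steps):
        d1 = w1 + k * math.sin(th2 - th1)
        d2 = w2 + k * math.sin(th1 - th2)
        th1 += d1 * dt
        th2 += d2 * dt
    return th2 - th1

# Coupled (k = 2): the phase difference locks near asin(0.25)
print(round(kuramoto_pair(10.0, 11.0, 2.0), 4))
# Uncoupled (k = 0): the phase difference drifts without bound
print(round(kuramoto_pair(10.0, 11.0, 0.0), 4))
```

Crossing the locking threshold is a bifurcation, loosely analogous to the subharmonics and onset delays the study observed as energy was exchanged between laryngeal sources.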


Subject(s)
Larynx , Voice , Humans , Vocal Cords , Glottis , Sound , Phonation
16.
JASA Express Lett ; 4(2)2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38350076

ABSTRACT

Human voice directivity shows horizontal asymmetries caused by the shape of the lips or the position of the teeth and tongue during vocalization. This study presents and analyzes the asymmetries of voice directivity datasets of 23 different phonemes. The asymmetries were determined from datasets obtained in previous measurements with 13 subjects in a surrounding spherical microphone array. The results show that asymmetries are inherent to human voice production and that they differ between the phoneme groups, with the strongest effect on the [s], the [l], and the nasals [m], [n], and [ŋ]. The least asymmetries were found for the plosives.


Subject(s)
Voice , Humans , Tongue
17.
J Appl Behav Anal ; 57(2): 444-454, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38379177

ABSTRACT

Response interruption and redirection (RIRD) is a common treatment for automatically reinforced vocal stereotypy; it involves the contingent presentation of task instructions. Tasks that are included in RIRD are typically selected based on caregiver report, which may affect the efficacy of RIRD. The purpose of the current study was to evaluate the role of task preference in the efficacy of RIRD for four participants who engaged in vocal stereotypy. We conducted task-preference assessments and selected tasks of varying preferences to include in RIRD. For three out of four participants, the results showed that RIRD with higher preference tasks was not effective at reducing vocal stereotypy, whereas RIRD with lower preference tasks was effective for all participants.


Subject(s)
Stereotypic Movement Disorder , Voice , Humans , Behavior Therapy/methods , Stereotyped Behavior/physiology , Stereotypic Movement Disorder/therapy
18.
J Voice ; 38(2): 251-252, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38403488
19.
Eur Arch Otorhinolaryngol ; 281(5): 2523-2529, 2024 May.
Article in English | MEDLINE | ID: mdl-38421393

ABSTRACT

OBJECTIVE: This study aimed to investigate the impact of the implant's vertical location during Type 1 Thyroplasty (T1T) on acoustics and glottal aerodynamics using an excised canine larynx model, providing insights into the optimal technique for treating unilateral vocal fold paralysis (UVFP). METHODS: Measurements were conducted in six excised canine larynges using Silastic implants. Two implant locations, glottal and infraglottal, were tested for each larynx at low and high subglottal pressure levels. Acoustic and intraglottal flow velocity field measurements were taken to assess vocal efficiency (VE), cepstral peak prominence (CPP), and the development of intraglottal vortices. RESULTS: The results indicated that the implant's vertical location significantly influenced vocal efficiency (p = 0.045), with the infraglottal implant generally yielding higher VE values. The effect on CPP was not statistically significant (p = 0.234). Intraglottal velocity field measurements demonstrated larger glottal divergence angles and stronger vortices with the infraglottal implant. CONCLUSION: The findings suggest that medializing the paralyzed fold at the infraglottal level rather than the glottal level can lead to improved vocal efficiency. The observed larger divergence angles and stronger intraglottal vortices with infraglottal medialization may enhance voice outcomes in UVFP patients. These findings have important implications for optimizing T1T procedures and improving voice quality in individuals with UVFP. Further research is warranted to validate these results in clinical settings.
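Vocal efficiency (VE) as used above is conventionally defined as radiated acoustic power divided by aerodynamic power (subglottal pressure times mean glottal airflow); the study's exact computation is not given, so the sketch below assumes that standard definition with made-up numbers:

```python
def vocal_efficiency(acoustic_power_w, subglottal_pressure_pa, mean_flow_m3s):
    """Vocal efficiency: radiated acoustic power divided by the
    aerodynamic power driving phonation (Ps x mean glottal flow)."""
    aero_power_w = subglottal_pressure_pa * mean_flow_m3s
    return acoustic_power_w / aero_power_w

# Toy numbers: 0.1 mW radiated, 1.5 kPa subglottal pressure, 0.2 L/s flow
ve = vocal_efficiency(1e-4, 1500.0, 0.0002)
print(ve)
```

Because the denominator is fixed by the driving conditions, a higher VE at the same subglottal pressure (as reported for infraglottal medialization) means more of the aerodynamic power is converted to sound.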


Subject(s)
Laryngoplasty , Larynx , Vocal Cord Paralysis , Voice , Humans , Animals , Dogs , Larynx/surgery , Glottis/surgery , Vocal Cord Paralysis/surgery , Acoustics , Vocal Cords/surgery
20.
J Acoust Soc Am ; 155(2): 1071-1085, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38341737

ABSTRACT

Children's speech understanding is vulnerable to indoor noise and reverberation: e.g., from classrooms. It is unknown how they develop the ability to use temporal acoustic cues, specifically amplitude modulation (AM) and voice onset time (VOT), which are important for perceiving distorted speech. Through three experiments, we investigated the typical development of AM depth detection in vowels (experiment I), categorical perception of VOT (experiment II), and consonant identification (experiment III) in quiet and in speech-shaped noise (SSN) and mild reverberation in 6- to 14-year-old children. Our findings suggested that AM depth detection using a naturally produced vowel at the rate of the fundamental frequency was particularly difficult for children and with acoustic distortions. While the VOT cue salience was monotonically attenuated with increasing signal-to-noise ratio of SSN, its utility for consonant discrimination was completely removed even under mild reverberation. The reverberant energy decay in distorting critical temporal cues provided further evidence that may explain the error patterns observed in consonant identification. By 11-14 years of age, children approached adult-like performance in consonant discrimination and identification under adverse acoustics, emphasizing the need for good acoustics for younger children as they develop auditory skills to process distorted speech in everyday listening environments.
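The amplitude-modulation stimuli described above impose a sinusoidal envelope of a given depth on a carrier; at modulation rates near the fundamental frequency this is particularly hard to detect. A toy stand-in for such a stimulus, a pure tone rather than the naturally produced vowels the study used:

```python
import math

def am_tone(f0=220.0, mod_rate=220.0, depth=0.5, dur=0.05, sr=16000):
    """Apply sinusoidal amplitude modulation of the given depth to a
    pure tone: y[t] = sin(2*pi*f0*t) * (1 + depth * sin(2*pi*mod_rate*t))."""
    n = int(dur * sr)
    carrier = [math.sin(2 * math.pi * f0 * t / sr) for t in range(n)]
    envelope = [1.0 + depth * math.sin(2 * math.pi * mod_rate * t / sr) for t in range(n)]
    return [c * e for c, e in zip(carrier, envelope)]

sig = am_tone(depth=0.5)       # modulation rate equal to f0, as in experiment I
print(len(sig))
```

Reverberation smears exactly this envelope (and the VOT gap), which is the mechanism the study invokes for the consonant-identification errors under distorted listening conditions.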


Subject(s)
Speech Perception , Voice , Adult , Child , Humans , Adolescent , Noise/adverse effects , Acoustics , Speech